
Add non-power-of-2 shapes for Morton coding to benchmarks#3717

Merged
d-v-b merged 5 commits into zarr-developers:main from mkitti:mkitti-morton-benchmarks
Feb 24, 2026

Conversation

@mkitti
Contributor

@mkitti mkitti commented Feb 20, 2026

  • tests: Add non-power-of-2 shard shapes to benchmarks
  • tests: Add near-miss power-of-2 shape (33

[Description of PR]

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

mkitti and others added 2 commits February 20, 2026 16:27
Add (30,30,30) to large_morton_shards and (10,10,10), (20,20,20),
(30,30,30) to morton_iter_shapes to benchmark the scalar fallback path
for non-power-of-2 shapes, which are not fully covered by the vectorized
hypercube path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the performance penalty when a shard shape is just above a
power-of-2 boundary, causing n_z to jump from 32,768 to 262,144.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the "needs release notes" label Feb 20, 2026
@mkitti
Contributor Author

mkitti commented Feb 20, 2026

Benchmark Results

These benchmarks were run on this branch (which includes the vectorized get_chunk_slice from #3713) to characterize Morton order performance across power-of-2 and non-power-of-2 shard shapes.

test_morton_order_iter — pure Morton computation, no I/O, LRU cache cleared each round

| Shape | Elements | Type | Mean time |
| --- | --- | --- | --- |
| (8,8,8) | 512 | power-of-2 | 0.45 ms |
| (16,16,16) | 4,096 | power-of-2 | 3.6 ms |
| (32,32,32) | 32,768 | power-of-2 | 28.9 ms |
| (10,10,10) | 1,000 | non-power-of-2 | 9.6 ms |
| (20,20,20) | 8,000 | non-power-of-2 | 88.2 ms |
| (30,30,30) | 27,000 | non-power-of-2 | 125.6 ms |
| (33,33,33) | 35,937 | near-miss (+1 above 32³) | 767 ms |

The near-miss penalty is striking: (33,33,33) has only ~10% more elements than (32,32,32) (35,937 vs 32,768) but takes about 27× longer (767 ms vs 28.9 ms). This is because the current floor-hypercube approach must scalar-decode many Morton codes beyond the guaranteed in-bounds region.
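To make the size of that out-of-bounds region concrete, here is a minimal sketch of the arithmetic. It assumes the floor hypercube uses the largest power of 2 not exceeding each dimension and that `n_z` is the number of codes in the smallest enclosing power-of-2 box; `morton_counts` is a hypothetical helper for illustration, not a zarr function.

```python
from math import prod


def morton_counts(shape):
    """Return (n_total, n_floor, n_z) for a shard shape with all dims >= 1.

    n_total: number of valid coordinates in the shape
    n_floor: size of the largest power-of-2 box that fits inside the shape
    n_z:     number of Morton codes in the smallest power-of-2 box enclosing it
    """
    ceil_bits = [(s - 1).bit_length() for s in shape]  # bits needed to index 0..s-1
    floor_bits = [s.bit_length() - 1 for s in shape]   # floor(log2(s))
    n_total = prod(shape)
    n_floor = 1 << sum(floor_bits)
    n_z = 1 << sum(ceil_bits)
    return n_total, n_floor, n_z


for shape in [(32, 32, 32), (30, 30, 30), (33, 33, 33)]:
    n_total, n_floor, n_z = morton_counts(shape)
    # n_z - n_floor is the number of codes outside the floor hypercube
    print(shape, n_total, n_floor, n_z, n_z - n_floor)
```

Under these assumptions, (33,33,33) has 262,144 candidate codes but only 32,768 of them fall in the floor hypercube, leaving 229,376 codes for the scalar path, even though only 35,937 coordinates are valid.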

test_sharded_morton_write_single_chunk — write 1 chunk to a large shard, cache cleared each round

| Shape | Chunks/shard | Mean time |
| --- | --- | --- |
| (32,32,32) | 32,768 | 35.7 ms |
| (30,30,30) | 27,000 | 127.5 ms |
| (33,33,33) | 35,937 | 767.8 ms |

test_sharded_morton_single_chunk — read 1 chunk from a large shard (cached after first access)

| Shape | Mean time |
| --- | --- |
| (32,32,32) | 0.73 ms |
| (30,30,30) | 0.69 ms |
| (33,33,33) | 0.71 ms |

Reads are fast across all shapes once the Morton order cache is warm: the first call pays the computation penalty, and subsequent reads reuse the cached order.
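The warm-cache behaviour can be illustrated with a small sketch built on `functools.lru_cache`. The function `cached_morton_order` and its pure-Python key computation are hypothetical stand-ins for illustration only; they are not the zarr implementation, and the bit convention (dimension 0 gets the lowest bit) is an assumption.

```python
from functools import lru_cache
from itertools import product


def _morton_key(coord, nbits):
    """Interleave the low `nbits` bits of each coordinate (dim 0 lowest)."""
    key, out_bit = 0, 0
    for bit in range(nbits):
        for c in coord:
            key |= ((c >> bit) & 1) << out_bit
            out_bit += 1
    return key


@lru_cache(maxsize=16)
def cached_morton_order(shape):
    """Return all coordinates of `shape` sorted by Morton key; cached per shape."""
    nbits = max((s - 1).bit_length() for s in shape)
    coords = product(*(range(s) for s in shape))
    return tuple(sorted(coords, key=lambda c: _morton_key(c, nbits)))


cached_morton_order.cache_clear()   # what a benchmark does to measure the cold path
cached_morton_order((3, 3, 3))      # first call computes the order (cache miss)
cached_morton_order((3, 3, 3))      # second call is served from the cache (hit)
print(cached_morton_order.cache_info())
```

Clearing the cache before each round, as the iteration and write benchmarks do, is what exposes the cold-path cost that the read benchmark hides.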

Interpretation

The benchmarks confirm that non-power-of-2 shard shapes carry a significant Morton computation penalty under the current implementation, with near-miss shapes (like (33,33,33)) being especially slow. These benchmarks provide a baseline to measure improvements from follow-on optimization work.

mkitti and others added 2 commits February 20, 2026 16:48
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot removed the "needs release notes" label Feb 21, 2026
@d-v-b d-v-b enabled auto-merge (squash) February 24, 2026 13:41
@d-v-b d-v-b merged commit 32c7ab9 into zarr-developers:main Feb 24, 2026
25 checks passed
d-v-b added a commit that referenced this pull request Feb 25, 2026
…ort strategy (#3718)

* tests: Add non-power-of-2 shard shapes to benchmarks

Add (30,30,30) to large_morton_shards and (10,10,10), (20,20,20),
(30,30,30) to morton_iter_shapes to benchmark the scalar fallback path
for non-power-of-2 shapes, which are not fully covered by the vectorized
hypercube path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* tests: Add near-miss power-of-2 shape (33,33,33) to benchmarks

Documents the performance penalty when a shard shape is just above a
power-of-2 boundary, causing n_z to jump from 32,768 to 262,144.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: Apply ruff format to benchmark file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changes: Add changelog entry for PR #3717

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf: Fix near-miss penalty in _morton_order with hybrid ceiling+argsort strategy

For shapes just above a power-of-2 (e.g. (33,33,33)), the ceiling-only
approach generates n_z=262,144 Morton codes for only 35,937 valid
coordinates (7.3× overgeneration). The floor+scalar approach is even
worse since the scalar loop iterates n_z-n_floor times (229,376 for
(33,33,33)), not n_total-n_floor.

The fix: when n_z > 4*n_total, use an argsort strategy that enumerates
all n_total valid coordinates via meshgrid, encodes each to a Morton code
using vectorized bit manipulation, then sorts by Morton code. This avoids
the large overgeneration while remaining fully vectorized.

Result for test_morton_order_iter:
  (30,30,30): 24ms  (ceiling, ratio=1.21)
  (32,32,32): 28ms  (ceiling, ratio=1.00)
  (33,33,33): 32ms  (argsort, ratio=7.3 → fixed from ~820ms with scalar)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
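
The argsort strategy described in the commit message above can be sketched with NumPy roughly as follows. This is an illustrative sketch, not the code merged in zarr: the function names are hypothetical, the bit convention (dimension 0 receives the lowest bit) is an assumption, and only the `n_z > 4*n_total` threshold is taken from the commit message.

```python
from math import prod

import numpy as np


def morton_encode(coords, bits):
    """Vectorized Morton encoding of an (n, ndim) integer coordinate array.

    `bits[d]` is the number of bits used for dimension d. Dimensions that run
    out of bits are skipped in later interleaving rounds.
    """
    codes = np.zeros(coords.shape[0], dtype=np.int64)
    out_bit = 0
    for coord_bit in range(max(bits)):
        for dim, nbits in enumerate(bits):
            if coord_bit < nbits:
                codes |= ((coords[:, dim] >> coord_bit) & 1) << out_bit
                out_bit += 1
    return codes


def morton_order_argsort(shape):
    """Enumerate every valid coordinate, encode it, and sort by Morton code."""
    bits = tuple((s - 1).bit_length() for s in shape)
    grids = np.meshgrid(*[np.arange(s, dtype=np.int64) for s in shape], indexing="ij")
    coords = np.stack([g.ravel() for g in grids], axis=1)  # (n_total, ndim)
    codes = morton_encode(coords, bits)
    return coords[np.argsort(codes, kind="stable")]


def choose_strategy(shape):
    """Pick a strategy using the n_z > 4 * n_total rule from the commit message."""
    n_total = prod(shape)
    n_z = 1 << sum((s - 1).bit_length() for s in shape)
    return "argsort" if n_z > 4 * n_total else "ceiling"


for shape in [(30, 30, 30), (32, 32, 32), (33, 33, 33)]:
    print(shape, choose_strategy(shape), morton_order_argsort(shape).shape)
```

Because only the `n_total` valid coordinates are ever materialized, the work scales with the shard's actual size rather than with the enclosing power-of-2 box, at the cost of one sort.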

* fix: Address pre-commit CI failures in _morton_order

- Replace Unicode multiplication sign × with ASCII x in comment (RUF003)
- Add explicit type annotation for np.argsort result to satisfy mypy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: Cast argsort result via np.asarray to resolve mypy no-any-return

np.stack returns Any in mypy's view, so indexing into it also returns
Any. Using np.asarray(..., dtype=np.intp) makes the type explicit and
avoids the no-any-return error at the return site.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: Pre-declare order type to resolve mypy no-any-return in _morton_order

np.asarray and np.stack return Any with numpy 2.1 type stubs, causing
mypy to infer the return type as Any. Pre-declaring order as
npt.NDArray[np.intp] before the if/else makes the intended type explicit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Davis Bennett <davis.v.bennett@gmail.com>